Abstract. We prove a general lower bound on the average-case complexity of
Shellsort: the average number of data-movements (and comparisons) made by a
p-pass Shellsort for any incremental sequence is $\Omega(pn^{1+1/p})$ for
every p. The proof method is an incompressibility argument based on Kolmogorov
complexity. Using similar techniques, the average-case complexity of several
other sorting algorithms is analyzed.
The question of a nontrivial general lower bound (or upper bound) on the average
complexity of Shellsort (due to D.L. Shell [15]) has been open for about four
decades [7, 14]. We present such a lower bound for p-pass Shellsort for every p.
1 Introduction



Shellsort sorts a list of n elements in p passes using a sequence of increments
$h_1, \ldots, h_p$. In the kth pass the main list is divided in $h_k$ separate
sublists of length $\lceil n/h_k \rceil$, where the ith sublist consists of the
elements at positions j, where $j \bmod h_k = i - 1$, of the main list
($i = 1, \ldots, h_k$). Every sublist is sorted using a straightforward
insertion sort. The efficiency of the method is governed by the number of
passes p and the selected increment sequence $h_1, \ldots, h_p$ with $h_p = 1$
to ensure sortedness of the final list. The original log n-pass¹ increment
sequence $\lfloor n/2 \rfloor, \lfloor n/4 \rfloor, \ldots, 1$ of Shell [15]
uses worst case $\Theta(n^2)$ time, but Papernov and Stasevitch [10] showed
that another related sequence uses $O(n^{3/2})$ and Pratt [12] extended this to
a class of all nearly geometric increment sequences and proved this bound was
tight. The currently best asymptotic method was found by Pratt [12]. It uses
all $\log^2 n$ increments of the form $2^i 3^j < \lfloor n/2 \rfloor$ to obtain
time $O(n \log^2 n)$ in the worst case. Moreover, since every pass takes at
least n steps, the average complexity using Pratt's increment sequence is
$\Theta(n \log^2 n)$. Incerpi and Sedgewick [2] constructed a family of
increment sequences for which Shellsort runs in
$O(n^{1+\epsilon/\sqrt{\log n}})$ time using $(8/\epsilon^2) \log n$ passes,
for every $\epsilon > 0$. B. Chazelle (attribution in [13]) obtained the same
result by generalizing Pratt's method: instead of using 2 and 3 to construct
the increment sequence use a and (a + 1) for fixed a, which yields a worst-case
running time of $n \log^2 n \,(a^2/\ln^2 a)$, which is
$O(n^{1+\epsilon/\sqrt{\log n}})$ for $\ln^2 a = O(\log n)$. Plaxton, Poonen
and Suel [11] proved an $\Omega(n^{1+\epsilon/\sqrt{p}})$ lower bound for p
passes of Shellsort using any increment sequence, for some $\epsilon > 0$;
taking $p = \Omega(\log n)$ shows that the Incerpi-Sedgewick / Chazelle bounds
are optimal for small p and taking p slightly larger shows a
$\Theta(n \log^2 n/(\log\log n)^2)$ lower bound on the worst case complexity of
Shellsort. Since every pass takes at least n steps this shows an
$\Omega(n \log^2 n/(\log\log n)^2)$ lower bound on the worst-case of every
Shellsort increment sequence. For the average-case running time Knuth [7]
showed $\Theta(n^{5/3})$ for the best choice of increments in p = 2 passes;
Yao [17] analyzed the average case for p = 3 but did not obtain a simple
analytic form; Yao's analysis was improved by Janson and Knuth [3] who showed
$O(n^{23/15})$ average-case running time for a particular choice of increments
in p = 3 passes. Apart from this no nontrivial results are known for the
average case; see [7, 13, 14].

¹ "log" denotes the binary logarithm and "ln" denotes the natural logarithm.



Results: We show a general $\Omega(pn^{1+1/p})$ lower bound on the average-case
running time of p-pass Shellsort under uniform distribution of input
permutations for every p.² This is the first advance on the problem of
determining general nontrivial bounds on the average-case running time of
Shellsort [|l2|j7|,p^,p|, |ll|jl^Jl4| .
Using the same simple method, we also obtain results on the average number of
stacks or queues (sequential or parallel) required for sorting under the uniform
distribution on input permutations. These problems have been studied before
by Knuth [7] and Tarjan [16] for the worst case.

Kolmogorov complexity and the Incompressibility Method: The tech- 
nical tool to obtain our results is the incompressibility method. This method is 
especially suited for the average case analysis of algorithms and machine models, 
whereas average-case analysis is usually more difficult than worst-case analysis 
using more traditional methods. A survey of the use of the incompressibility
method is [8] Chapter 6, and recent work is [1]. The most spectacular successes
of the method occur in the computational complexity analysis of algorithms. 

Informally, the Kolmogorov complexity $C(x)$ of a binary string x is the length
of the shortest binary program (for a fixed reference universal machine) that
prints x as its only output and then halts [6]. A string x is incompressible if
$C(x)$ is at least $|x|$, the approximate length of a program that simply
includes all of x literally. Similarly, the conditional Kolmogorov complexity
of x with respect to y, denoted by $C(x|y)$, is the length of the shortest
program that, with extra information y, prints x. A string x is incompressible
relative to y if $C(x|y)$ is large in the appropriate sense. For details see
[8]. Here we use that, both absolutely and relative to any fixed string y,
there are incompressible strings of



² The trivial lower bound is $\Omega(pn)$ comparisons since every element needs
to be compared at least once in every pass.



every length, and that most strings are nearly incompressible, by any standard.³
Another easy one is that significantly long subwords of an incompressible string
are themselves nearly incompressible by any standard, even relative to the rest
of the string. In the sequel we use the following easy facts (sometimes only
implicitly).

Lemma 1. Let c be a positive integer. For every fixed y, every finite set A
contains at least $(1 - 2^{-c})|A| + 1$ elements x with
$C(x|A,y) \ge \lfloor \log |A| \rfloor - c$.

Lemma 2. If A is a set, then for every y every element $x \in A$ has complexity
$C(x|A,y) \le \log |A| + O(1)$.

The first lemma is proved by simple counting. The second lemma holds since x
can be described by first describing A in $O(1)$ bits and then giving the index
of x in the enumeration order of A.
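The counting behind Lemma 1 can be made concrete with a small sketch. The function name `incompressible_lower_bound` is an assumption introduced here for illustration; it only counts how many binary programs are shorter than the threshold $\lfloor \log |A| \rfloor - c$ and subtracts that from $|A|$.

```python
def incompressible_lower_bound(size, c):
    """Lower bound on how many of `size` elements cannot be described
    by a program shorter than floor(log2(size)) - c bits."""
    # floor(log2(size)) for a positive integer, computed exactly.
    threshold = max(size.bit_length() - 1 - c, 0)
    # Binary programs of length < threshold: 2^0 + ... + 2^(threshold-1).
    short_programs = 2 ** threshold - 1
    # Each short program describes at most one element of the set.
    return size - short_programs


# Example: a set of 1000 elements with c = 3.
count = incompressible_lower_bound(1000, 3)
print(count, (1 - 2 ** -3) * 1000 + 1)
```

The printed pair compares the direct count with the $(1 - 2^{-c})|A| + 1$ expression of the lemma; the direct count is never smaller.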



2 Shellsort 

A Shellsort computation consists of a sequence of comparison and inversion
(swapping) operations. In this analysis of the average-case lower bound we count
just the total number of data movements (here inversions) executed. The same
bound holds automatically for the number of comparisons. The average is taken
over the uniform distribution of all lists of n items.
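To make the cost measure concrete, the following is a minimal sketch of a p-pass Shellsort that counts data movements and comparisons. The function name `shellsort_count` and the convention of passing the increments as a list ending in 1 are assumptions made for this illustration.

```python
def shellsort_count(items, increments):
    """Sort a copy of `items` with the given increments and return
    (sorted_list, data_movements, comparisons)."""
    a = list(items)
    moves = 0
    comparisons = 0
    for h in increments:
        # One pass: insertion sort on every h-chain, interleaved.
        for i in range(h, len(a)):
            key = a[i]
            j = i
            while j >= h:
                comparisons += 1
                if a[j - h] > key:
                    a[j] = a[j - h]  # shift a larger element one step right
                    moves += 1
                    j -= h
                else:
                    break
            a[j] = key
    return a, moves, comparisons


print(shellsort_count([5, 4, 3, 2, 1], [1]))     # single pass: insertion sort
print(shellsort_count([5, 4, 3, 2, 1], [2, 1]))  # two passes
```

With the single increment 1 the movement count equals the number of inversions of the input, which is the quantity the analysis below bounds.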

The proof is based on the following intuitive idea: There are n! different 
permutations. Given the sorting process (the insertion paths in the right order) 
one can recover the correct permutation from the sorted list. Hence one requires 
n! pairwise different sorting processes. This gives a lower bound on the minimum 
of the maximal length of a process. We formulate the proof in the crisp format 
of incompressibility. 

Theorem 1. The average number of inversions in p-pass Shellsort on lists of n
keys is at least $\Omega(pn^{1+1/p})$ for every increment sequence.

Proof. Let the list to be sorted consist of a permutation $\pi$ of the elements
1, ..., n. Consider a $(h_1, \ldots, h_p)$ Shellsort algorithm A where $h_k$ is
the increment in the kth pass and $h_p = 1$. For any $1 \le i \le n$ and
$1 \le k \le p$, let $m_{i,k}$ be the number of elements in the $h_k$-chain
containing element i that are to the left of i at the beginning of pass k and
are larger than i. Observe that $\sum_{i=1}^{n} m_{i,k}$ is the number of
inversions in the initial permutation of pass k, and that the



³ By a simple counting argument one can show that whereas some strings can be
enormously compressed, like strings of the form 11...1, the majority of strings
can hardly be compressed at all. For every n there are $2^n$ binary strings of
length n, but only $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ possible shorter
descriptions. Therefore, there is at least one binary string x of length n such
that $C(x) \ge n$. Similarly, for every length n and any binary string y, there
is a binary string x of length n such that $C(x|y) \ge n$.



insertion sort in pass k requires precisely $\sum_{i=1}^{n} (m_{i,k} + 1)$
comparisons. Let M denote the total number of inversions:

$$M := \sum_{k=1}^{p} \sum_{i=1}^{n} m_{i,k}. \qquad (1)$$

Claim. Given all the $m_{i,k}$'s in an appropriate fixed order, we can
reconstruct the original permutation $\pi$.

Proof. The $m_{i,p}$'s trivially specify the initial permutation of pass p. In
general, given the $m_{i,k}$'s and the final permutation of pass k, we can
easily reconstruct the initial permutation of pass k. □
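The reconstruction in the Claim can be sketched in code. The names `shellsort_trace` and `reconstruct` are hypothetical, and each pass is simulated by sorting every chain directly (an assumption that reproduces the result of the insertion sort on that chain, not its individual steps).

```python
def shellsort_trace(perm, increments):
    """Run the passes and record, per pass, a dict value -> m_{i,k}:
    the number of larger elements to the left in the same chain."""
    a = list(perm)
    trace = []
    for h in increments:
        m = {}
        for start in range(h):
            chain = a[start::h]
            for pos, v in enumerate(chain):
                m[v] = sum(1 for u in chain[:pos] if u > v)
            a[start::h] = sorted(chain)  # result of insertion sort on the chain
        trace.append(m)
    return a, trace


def reconstruct(n, increments, trace):
    """Undo the passes from last to first, starting from the sorted list."""
    a = list(range(1, n + 1))
    for h, m in zip(reversed(increments), reversed(trace)):
        for start in range(h):
            chain = a[start::h]  # chain contents after pass k (sorted)
            initial = []
            # Insert values from largest to smallest: when v is placed, all
            # elements present are larger, so v goes to index m[v].
            for v in sorted(chain, reverse=True):
                initial.insert(m[v], v)
            a[start::h] = initial
    return a


perm = [3, 1, 4, 5, 2]
final, trace = shellsort_trace(perm, [2, 1])
print(final, reconstruct(len(perm), [2, 1], trace) == perm)
```

The sketch assumes the last increment is 1, so that the list after the final pass is 1, ..., n.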

Let M as in (1) be a fixed number. Let permutation $\pi$ be an incompressible
permutation having Kolmogorov complexity

$$C(\pi | n, A, P) \ge \log n! - \log n, \qquad (2)$$

where P is the encoding program in the following discussion. The description in
the Claim is effective and therefore its minimum length must exceed the
complexity of $\pi$:

$$C(m_{1,1}, \ldots, m_{n,p} | n, A, P) \ge C(\pi | n, A, P). \qquad (3)$$

Any M as defined by (1) such that every division of M in $m_{i,k}$'s
contradicts (3) would be a lower bound on the number of inversions performed.
There are

$$D(M) := \binom{M + np - 1}{np - 1} \qquad (4)$$

possible divisions of M into np nonnegative integral summands $m_{i,k}$'s. Every
division can be indicated by its index j in an enumeration of these divisions.
Therefore, a self-delimiting description of M followed by a description of j
effectively describes the $m_{i,k}$'s. The length of this description must by
definition exceed its Kolmogorov complexity. That is,

$$\log D(M) + \log M + 2 \log\log M \ge C(m_{1,1}, \ldots, m_{n,p} | n, A, P) + O(1).$$

We know that $M \le pn^2$ since every $m_{i,k} \le n$. We can assume⁴ $p < n$.
Together with (2) and (3), we have

$$\log D(M) \ge \log n! - 4 \log n + O(1). \qquad (5)$$

By (4) $\log D(M)$ is bounded above by⁵

$$\log \binom{M + np - 1}{np - 1} = (np - 1) \log \frac{M + np - 1}{np - 1} + M \log \frac{M + np - 1}{M} + \frac{1}{2} \log \frac{M + np - 1}{(np - 1)M} + O(1).$$

⁴ Otherwise we require at least $n^2$ comparisons.
⁵ Use the following formula ([8], p. 10),

$$\log \binom{a}{b} = b \log \frac{a}{b} + (a - b) \log \frac{a}{a - b} + \frac{1}{2} \log \frac{a}{b(a - b)} + O(1).$$

By (5) we have $M \to \infty$ for $n \to \infty$. Therefore, the second term in
the right-hand side equals

$$\log \left(1 + \frac{np - 1}{M}\right)^{M} \to \log e^{np - 1}$$

for $n \to \infty$. Since $0 < p < n$ and $n \le M \le pn^2$,

$$\frac{1}{2(np - 1)} \log \frac{M + np - 1}{(np - 1)M} \to 0$$

for $n \to \infty$. Therefore, the total right-hand side goes to

$$(np - 1) \left( \log \left( \frac{M}{np - 1} + 1 \right) + \log e \right)$$

for $n \to \infty$. Together with (5) this yields

$$M = \Omega(pn^{1+1/p}).$$

Therefore, the running time of the algorithm is as stated in the theorem for
every permutation $\pi$ satisfying (2). By Lemma 1 at least a
$(1 - 1/n)$-fraction of all permutations $\pi$ require that high complexity.
Therefore, the following is a lower bound on the expected number of inversions
of the sorting procedure:

$$\left(1 - \frac{1}{n}\right) \Omega(pn^{1+1/p}) + \frac{1}{n} \Omega(0) = \Omega(pn^{1+1/p}).$$

This gives us the theorem. □
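As a numeric illustration of the last step, the sketch below searches for the smallest M whose number of divisions $D(M) = \binom{M + np - 1}{np - 1}$ satisfies $\log D(M) \ge \log n! - 4 \log n$, dropping the $O(1)$ term. The function names and the use of `math.lgamma` to evaluate the logarithms are choices made here, not part of the proof.

```python
import math


def log2_binomial(a, b):
    """Binary logarithm of the binomial coefficient C(a, b)."""
    ln = math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)
    return ln / math.log(2)


def min_inversions(n, p):
    """Smallest M with log2 D(M) >= log2(n!) - 4*log2(n), found by binary
    search; D(M) is increasing in M."""
    target = math.lgamma(n + 1) / math.log(2) - 4 * math.log2(n)
    lo, hi = 0, p * n * n  # M is at most p*n^2
    while lo < hi:
        mid = (lo + hi) // 2
        if log2_binomial(mid + n * p - 1, n * p - 1) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


n = 1000
for p in (1, 2, 3):
    m = min_inversions(n, p)
    print(p, m, m / (p * n ** (1 + 1 / p)))
```

The last printed column is the ratio of the threshold to $pn^{1+1/p}$; under these assumptions it gives a rough feel for the constant hidden in the $\Omega$ for a fixed n.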

Compare our lower bound on the average-case with the Plaxton-Poonen-Suel
$\Omega(n^{1+\epsilon/\sqrt{p}})$ worst case lower bound [11]. Some special
cases of the lower bound on the average-case complexity are:

1. When p = 1, this gives an asymptotically tight bound for the average number
of inversions for Insertion Sort.

2. When p = 2, Shellsort requires $\Omega(n^{3/2})$ inversions (the tight bound
is known to be $\Theta(n^{5/3})$ [7]);

3. When p = 3, Shellsort requires $\Omega(n^{4/3})$ inversions (the best known
upper bound is $O(n^{23/15})$ in [3]);

4. When $p = \log n/\log\log n$, Shellsort requires
$\Omega(n \log^2 n/\log\log n)$ inversions;

5. When $p = \log n$, Shellsort requires $\Omega(n \log n)$ inversions. When we
consider comparisons, this is of course the lower bound of average number of
comparisons for every sorting algorithm.

6. In general, when $p = p(n) > \log n$, Shellsort requires
$\Omega(n \cdot p(n))$ inversions (it requires that many comparisons anyway
since every pass trivially makes n comparisons).
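As a quick check of cases 4 and 5 (a worked computation added for illustration, with log the binary logarithm), substitute the value of p into $pn^{1+1/p}$:

```latex
% Case 4: p = \log n / \log\log n
n^{1/p} = 2^{(\log n)/p} = 2^{\log\log n} = \log n
\quad\Longrightarrow\quad
p\,n^{1+1/p} = \frac{\log n}{\log\log n} \cdot n \cdot \log n
             = \frac{n \log^2 n}{\log\log n}.

% Case 5: p = \log n
n^{1/p} = 2^{(\log n)/\log n} = 2
\quad\Longrightarrow\quad
p\,n^{1+1/p} = 2\,n \log n .
```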

In [14] it is mentioned that the existence of an increment sequence yielding an
average $O(n \log n)$ Shellsort has been open for 30 years. The above lower
bound on the average shows that the number p of passes of such an increment
sequence (if it exists) is precisely $p = \Theta(\log n)$; all the other
possibilities are ruled out.



3 Sorting with Queues and Stacks 

Knuth [7] and Tarjan [16] have studied the problem of sorting using a network of
queues or stacks. In particular, the main variants of the problem are: assuming
the stacks or queues are arranged sequentially or in parallel, how many stacks
or queues are needed to sort n numbers. Here, the input sequence is scanned
from left to right. We will concentrate on the average-case analyses of the
above two main variants, although our technique in general applies to arbitrary
acyclic networks of stacks and queues as studied in [16].

3.1 Sorting with Sequential Stacks 

The sequential stack sorting problem is in [7] exercise 5.2.4-20. We have k
stacks numbered $S_0, \ldots, S_{k-1}$ arranged sequentially from right to left.
The input is a permutation $\pi$ of the elements 1, ..., n. Initially we feed
the elements of $\pi$ to $S_0$ at most one at a time in the order in which they
appear in $\pi$. At every step we can pop a stack (the popped elements will move
to the left) or push an incoming element on a stack. The question is how many
stacks are needed for sorting $\pi$. It is known that $k = \log n$ stacks
suffice, and $\frac{1}{2} \log n$ stacks are necessary in the worst-case
[7, 16]. Here we prove that the same lower bound also holds on the average with
a very simple incompressibility argument.

Theorem 2. On the average, at least $\frac{1}{2} \log n$ stacks are needed for
sequential stack sort.

Proof. Fix an incompressible permutation $\pi$ such that

$$C(\pi | n, P) \ge \log n! - \log n = n \log n - O(n),$$

where P is an encoding program to be specified in the following.

Assume that k stacks are sufficient to sort $\pi$. We now encode such a sorting
process. For every stack, exactly n elements pass through it. Hence we need to
perform precisely n pushes and n pops on every stack. Encode a push as 0 and
a pop as 1. It is easy to prove that different permutations must have different
push/pop sequences on at least one stack. Thus with 2kn bits, we can completely
specify the input permutation $\pi$.⁶ Then, as before,

$$2kn \ge \log n! - \log n = n \log n - O(n).$$

Hence, approximately $k \ge \frac{1}{2} \log n$ for incompressible permutations
$\pi$.

Since most (a $(1 - 1/n)$th fraction) permutations are incompressible, we can
calculate the average-case lower bound as:

$$\frac{1}{2} \log n \cdot \frac{n - 1}{n} + 1 \cdot \frac{1}{n} \approx \frac{1}{2} \log n.$$

□
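The encoding idea can be illustrated for the simplest case of a single stack (k = 1), which is an assumption of this sketch: only permutations sortable with one stack are handled. The names `stack_sort_bits` and `decode_bits` are hypothetical. A push is written as 0 and a pop as 1, and the decoder recovers the input from the bit string alone because the output is known to be 1, ..., n.

```python
def stack_sort_bits(perm):
    """Sort `perm` with one stack and return the push/pop bit sequence.
    Raises ValueError if one stack is not enough."""
    stack, bits = [], []
    next_out = 1
    for x in perm:
        stack.append(x)
        bits.append(0)  # push
        while stack and stack[-1] == next_out:
            stack.pop()
            bits.append(1)  # pop
            next_out += 1
    if stack:
        raise ValueError("permutation is not sortable with one stack")
    return bits


def decode_bits(bits):
    """Recover the input permutation from the 2n push/pop bits."""
    n = len(bits) // 2
    perm = [None] * n
    stack = []
    next_in, next_out = 0, 1
    for b in bits:
        if b == 0:
            stack.append(next_in)  # remember which input position was pushed
            next_in += 1
        else:
            perm[stack.pop()] = next_out  # popped element is the next output
            next_out += 1
    return perm


bits = stack_sort_bits([2, 1, 3])
print(bits, decode_bits(bits))
```

The 2n bits per stack in this sketch correspond to the 2kn bits counted in the proof when k stacks are used.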



⁶ In fact since each stack corresponds to precisely n pushes and n pops where
the pushes and pops form a "balanced" string, the Kolmogorov complexity of such
a sequence is at most $g(n) := 2n - \frac{3}{2} \log n + O(1)$ bits. So
$2kg(n)$ bits would suffice to specify the input permutation. But this does not
yield a nontrivial improvement.



3.2 Sorting with Parallel Stacks 

Clearly, the input sequence 2, 3, 4, ..., n, 1 requires n - 1 parallel stacks
to sort. Hence the worst-case complexity of sorting with parallel stacks is
n - 1. However, most sequences do not need this many stacks to sort in the
parallel arrangement. The next two theorems show that on the average,
$\Theta(\sqrt{n})$ stacks are both necessary and sufficient. Observe that the
result is actually implied by the connection between sorting with parallel
stacks and longest increasing subsequences given in [16] and the bounds on the
length of longest increasing subsequences of random permutations given in
[5, 9, 4]. However, the proofs in [5, 9, 4] use deep results from probability
theory (such as Kingman's ergodic theorem) and are quite sophisticated. Here we
give simple proofs using incompressibility arguments.

Theorem 3. On the average, the number of parallel stacks needed to sort n
elements is $O(\sqrt{n})$.

Proof. Consider an incompressible permutation $\pi$ satisfying

$$C(\pi | n) \ge \log n! - \log n. \qquad (6)$$

We use the following trivial algorithm (which is described in [16]) to sort
$\pi$ with stacks in the parallel arrangement. Assume that the stacks are named
$S_0, S_1, \ldots$ and the input sequence is denoted as $x_1, \ldots, x_n$.

Algorithm Parallel-Stack-Sort 

1. For i = 1 to n do

Scan the stacks from left to right, and push $x_i$ on the first stack $S_j$
whose top element is larger than $x_i$. If such a stack doesn't exist, put $x_i$
on the first empty stack.

2. Pop the stacks in the ascending order of their top elements.
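A minimal sketch of the two steps above follows. The function name `parallel_stack_sort` and the choice to return the number of stacks used alongside the output are assumptions made for illustration.

```python
def parallel_stack_sort(seq):
    """Greedy sort with parallel stacks. Returns (sorted_output, stacks_used)."""
    stacks = []
    # Step 1: push each element on the first stack whose top is larger,
    # or open a new stack if there is none.
    for x in seq:
        for s in stacks:
            if s[-1] > x:
                s.append(x)
                break
        else:
            stacks.append([x])
    used = len(stacks)
    # Step 2: every stack is decreasing from bottom to top, so the smallest
    # remaining element is always on top of some stack.
    out = []
    while any(stacks):
        smallest = min((s for s in stacks if s), key=lambda s: s[-1])
        out.append(smallest.pop())
    return out, used


print(parallel_stack_sort([2, 3, 4, 5, 1]))
print(parallel_stack_sort([3, 1, 4, 5, 2]))
```

The first call uses the worst-case input 2, 3, ..., n, 1 mentioned at the start of this subsection.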

We claim that algorithm Parallel-Stack-Sort uses $O(\sqrt{n})$ stacks on the
permutation $\pi$. First, we observe that if the algorithm uses m stacks on
$\pi$ then we can identify an increasing subsequence of $\pi$ of length m as in
[16]. This can be done by a trivial backtracing starting from the top element of
the last stack. Then we argue that $\pi$ cannot have an increasing subsequence
of length longer than $e\sqrt{n}$, where e is the natural constant, since it is
compressible by at most log n bits.

Suppose that $\sigma$ is a longest increasing subsequence of $\pi$ and
$m = |\sigma|$ is the length of $\sigma$. Then we can encode $\pi$ by
specifying:

1. a description of this encoding scheme in $O(1)$ bits;

2. the number m in $\log m$ bits;

3. the combination $\sigma$ in $\log \binom{n}{m}$ bits;

4. the locations of the elements of $\sigma$ in $\pi$ in at most
$\log \binom{n}{m}$ bits; and

5. the remaining $\pi$ with the elements of $\sigma$ deleted in $\log (n - m)!$
bits.



This takes a total of

$$\log (n - m)! + 2 \log \frac{n!}{m!(n - m)!} + \log m + O(1) + 2 \log\log m$$

bits. Using Stirling approximation and the fact that $\sqrt{n} \le m = o(n)$,
the above expression is upper bounded by:

$$\log n! + \log \frac{(n/e)^n}{(m/e)^{2m} ((n - m)/e)^{n - m}} + O(\log n)$$

$$\approx \log n! + m \log \frac{n}{m^2} + (n - m) \log \frac{n}{n - m} + m \log e + O(\log n)$$

$$\approx \log n! + m \log \frac{n}{m^2} + 2m \log e + O(\log n)$$

This description length must exceed the complexity of the permutation which
is lower-bounded in (6). This requires that (approximately)
$m \le e\sqrt{n} = O(\sqrt{n})$. This yields an average complexity of
Parallel-Stack-Sort of:

$$O(\sqrt{n}) \cdot \frac{n - 1}{n} + n \cdot \frac{1}{n} = O(\sqrt{n}).$$

□
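The bound $m \le e\sqrt{n}$ on the longest increasing subsequence can be compared with a small experiment. This is only an empirical sketch: the function names, the seed, and the sample sizes are arbitrary choices made here, and the subsequence length is computed with a standard binary-search method rather than by running the stack algorithm.

```python
import bisect
import math
import random


def longest_increasing_subsequence_length(perm):
    """Length of a longest increasing subsequence of distinct values."""
    tails = []  # tails[k] = smallest last element of an increasing run of length k+1
    for x in perm:
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def average_lis_length(n, trials, seed=0):
    """Average longest-increasing-subsequence length over random permutations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += longest_increasing_subsequence_length(perm)
    return total / trials


n = 400
print(average_lis_length(n, 20), math.e * math.sqrt(n))
```

The second printed number is $e\sqrt{n}$; the observed average is expected to stay below it for typical (incompressible) permutations.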



Theorem 4. On the average, the number of parallel stacks required to sort a
permutation is $\Omega(\sqrt{n})$.

Proof. Let A be any sorting algorithm using parallel stacks. Fix an
incompressible permutation $\pi$ with $C(\pi | n, P) \ge \log n! - \log n$,
where P is the program to do the encoding discussed in the following. Suppose
that A uses T parallel stacks to sort $\pi$. This sorting process involves a
sequence of moves, and we can encode this sequence of moves as a sequence of the
following items: "push to stack i" and "pop stack j", where the element to be
pushed is the next unprocessed element from the input sequence and the popped
element is written as the next output element. Each of these terms requires
log T bits. In total, we use precisely 2n terms since every element has to be
pushed once and popped once. Such a sequence is unique for every permutation.

Thus we have a description of an input sequence with length 2n log T bits,
which must exceed $C(\pi | n, P) \ge n \log n - O(n)$. It follows that
$\log T \ge \frac{1}{2} \log n - O(1)$, that is, $T = \Omega(\sqrt{n})$. This
yields the average-case complexity of A:

$$\Omega(\sqrt{n}) \cdot \frac{n - 1}{n} + 1 \cdot \frac{1}{n} = \Omega(\sqrt{n}).$$

□



3.3 Sorting with Parallel Queues 

It is easy to see that sorting cannot be done with a sequence of queues. So we
consider the complexity of sorting with parallel queues. It turns out that all
the results in the previous subsection also hold for queues.

As noticed in [16], the worst-case complexity of sorting with parallel queues
is n since the input sequence n, n - 1, ..., 1 requires n queues to sort. We
show in the next two theorems that on the average, $\Theta(\sqrt{n})$ queues are
both necessary and sufficient. Again, the result is implied by the connection
between sorting with parallel queues and longest decreasing subsequences given
in [16] and the bounds in [5, 9, 4] (with sophisticated proofs). Our proofs are
almost trivial given the proofs in the previous subsection.

Theorem 5. On the average, the number of parallel queues needed to sort n
elements is upper bounded by $O(\sqrt{n})$.

Proof. The proof is very similar to the proof of Theorem 3. We use a slightly
modified greedy algorithm as described in [16]:

Algorithm Parallel-Queue-Sort

1. For i = 1 to n do

Scan the queues from left to right, and append $x_i$ on the first queue
whose rear element is smaller than $x_i$. If such a queue doesn't exist, put
$x_i$ on the first empty queue.

2. Delete the front elements of the queues in the ascending order.
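The queue variant can be sketched the same way. The function name `parallel_queue_sort` and the returned queue count are assumptions made for this illustration.

```python
from collections import deque


def parallel_queue_sort(seq):
    """Greedy sort with parallel queues. Returns (sorted_output, queues_used)."""
    queues = []
    # Step 1: append each element to the first queue whose rear is smaller,
    # or open a new queue if there is none.
    for x in seq:
        for q in queues:
            if q[-1] < x:
                q.append(x)
                break
        else:
            queues.append(deque([x]))
    used = len(queues)
    # Step 2: every queue is increasing from front to rear, so the smallest
    # remaining element is always at the front of some queue.
    out = []
    while any(queues):
        smallest = min((q for q in queues if q), key=lambda q: q[0])
        out.append(smallest.popleft())
    return out, used


print(parallel_queue_sort([5, 4, 3, 2, 1]))
print(parallel_queue_sort([3, 1, 4, 5, 2]))
```

The first call uses the worst-case input n, n - 1, ..., 1, which needs one queue per element.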

Again, we can claim that algorithm Parallel-Queue-Sort uses $O(\sqrt{n})$ queues
on any permutation $\pi$ that cannot be compressed by more than log n bits. We
first observe that if the algorithm uses m queues on $\pi$ then a decreasing
subsequence of $\pi$ of length m can be identified, and we then argue that $\pi$
cannot have a decreasing subsequence of length longer than $e\sqrt{n}$, in a way
analogous to the argument in the proof of Theorem 3. □

Theorem 6. On the average, the number of parallel queues required to sort a
permutation is $\Omega(\sqrt{n})$.

Proof. The proof is the same as the one for Theorem 4 except that we should
replace "push" with "enqueue" and "pop" with "dequeue". □



4 Conclusion 

The incompressibility method is a good tool for analyzing the average-case
complexity of sorting algorithms. Simplicity has been our goal. Examples of such
average-case analyses of some other algorithms are given in [1]. This
methodology and its applications can be easily taught to undergraduate students.



The average-case performance of Shellsort has been one of the most funda- 
mental and interesting open problems in the area of algorithm analysis. The 
simple average-case analysis of Insertion Sort (1-pass Shellsort), stack-sort and 
queue-sort are further examples to demonstrate the generality and simplicity of 
our technique in analyzing sorting algorithms in general. Some open questions 
are: 

1. Tighten the average-case lower bound for Shellsort. Our bound is not tight 
for p = 2 passes. 

2. For sorting with sequential stacks, can we close the gap between the log n
upper bound and the $\frac{1}{2} \log n$ lower bound?